6 research outputs found

    All-rounder: A flexible DNN accelerator with diverse data format support

    Full text link
    Recognizing the explosive growth of DNN-based applications, several industrial companies have developed custom ASICs (e.g., Google TPU, IBM RaPiD, Intel NNP-I/NNP-T) and built hyperscale cloud infrastructure around them. These ASICs perform the inference or training of DNN models requested by users. Since DNN models use different data formats and types of operations, an ASIC needs to support diverse data formats and be general enough to cover the operations. Conventional ASICs, however, do not fulfill these requirements. To overcome these limitations, we propose a flexible DNN accelerator called All-rounder. The accelerator is designed with an area-efficient multiplier that supports multiple precisions of integer and floating-point datatypes. In addition, it comprises a flexibly fusible and fissionable MAC array to support various types of DNN operations efficiently. We implemented the register transfer level (RTL) design in Verilog and synthesized it in 28 nm CMOS technology. To examine the practical effectiveness of our proposed designs, we implemented two multiply units and three state-of-the-art DNN accelerators as baselines. We compare our multiplier with these multiply units and perform an architectural evaluation of performance and energy efficiency on eight real-world DNN models. Furthermore, we compare the benefits of the All-rounder accelerator against a high-end GPU card, i.e., an NVIDIA GeForce RTX 3090. The proposed All-rounder accelerator consistently achieves higher speedup and energy efficiency than the baselines across the DNN benchmarks.
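
    The sketch below illustrates the general decomposition trick behind precision-scalable multipliers, where one wide multiply is assembled from several narrow partial products that can alternatively serve as independent low-precision multiplies. It is a minimal Python illustration of the concept only, not All-rounder's actual circuit (which additionally covers floating-point formats).

        # Illustrative sketch: compose a 16x16-bit unsigned multiply from four
        # 8x8-bit partial products. Precision-scalable multipliers exploit this
        # structure; in a low-precision mode the same four 8-bit multipliers
        # can instead produce four independent INT8 products per cycle.
        def mul16_from_mul8(a: int, b: int) -> int:
            a_hi, a_lo = a >> 8, a & 0xFF
            b_hi, b_lo = b >> 8, b & 0xFF
            # Shift each 8x8 partial product into place and sum.
            return ((a_hi * b_hi) << 16) + ((a_hi * b_lo) << 8) \
                 + ((a_lo * b_hi) << 8) + (a_lo * b_lo)

        assert mul16_from_mul8(0xBEEF, 0xCAFE) == 0xBEEF * 0xCAFE

    The design choice this captures is reuse: the same narrow multiplier array covers both wide-precision and high-throughput low-precision modes instead of dedicating separate hardware to each.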

    Deep Partitioned Training from Near-Storage Computing to DNN Accelerators

    No full text
    In this paper, we present deep partitioned training to accelerate the computations involved in training DNN models. This is the first work to partition a DNN model across storage devices, an NPU, and a host CPU, forming a unified compute node for training workloads. To validate the benefit of the proposed system during DNN training, a trace-based simulator or an FPGA prototype is used to estimate overall performance and to obtain the layer index at which partitioning yields the minimum latency. As a case study, we select two benchmarks: vision-related tasks and a recommendation system. As a result, training time is reduced by 12.2-31.0% with four near-storage computing devices in the vision-related tasks with a mini-batch size of 512, and by 40.6-44.7% with one near-storage computing device in the selected recommendation system with a mini-batch size of 64.
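
    To make the partition-point search concrete, here is a minimal Python sketch under a simple additive latency model: per-layer latencies on the near-storage devices and on the NPU, plus a transfer cost for the activations handed over at the split. The model and all cost numbers are illustrative assumptions, not the paper's simulator or measurements.

        # Pick the split index minimizing estimated end-to-end latency.
        # nsc_time[i] / npu_time[i]: assumed latency of layer i on the
        # near-storage device / NPU; xfer_time[i]: assumed cost of moving
        # layer i's output activations across the partition boundary.
        def best_partition(nsc_time, npu_time, xfer_time):
            n = len(nsc_time)
            candidates = []
            for split in range(n + 1):  # layers [0, split) run near storage
                latency = (sum(nsc_time[:split])
                           + (xfer_time[split - 1] if split else 0)
                           + sum(npu_time[split:]))
                candidates.append((latency, split))
            return min(candidates)

        # Made-up per-layer costs (ms): early layers cheap near storage.
        lat, idx = best_partition([1, 1, 5], [3, 3, 3], [1, 1, 1])
        print(f"first {idx} layers near storage; estimated latency {lat} ms")

    With these assumed costs the search keeps the two cheap early layers near storage and hands the rest to the NPU, which is the qualitative behavior the paper's partitioning targets.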

    SEMS: Scalable Embedding Memory System for Accelerating Embedding-Based DNNs

    No full text
    Embedding layers, which are widely used in various deep learning (DL) applications, are very large and continue to grow. We propose the scalable embedding memory system (SEMS) to handle inference for DL applications with large embedding layers. SEMS is built from scalable embedding memory (SEM) modules, which include an FPGA for acceleration. In SEMS, the scalable and versatile PCIe bus is used to expand system memory, and processing within the SEMs reduces the amount of data transferred from the SEMs to the host, improving the effective bandwidth of PCIe. To achieve better performance, we apply various optimization techniques at different levels. We also develop SEMlib, a Python library that makes SEMS convenient to use. We implement a proof-of-concept prototype of SEMS; using it yields DLRM execution that is 32.85x faster than a CPU-based system when there is not enough DRAM to hold the entire embedding layer.
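
    A back-of-the-envelope sketch of the effective-bandwidth argument: pooling embedding rows inside a SEM module means one reduced vector crosses PCIe per lookup instead of every gathered row. The row size and lookup count below are illustrative assumptions, not measurements or parameters from the paper.

        # Bytes crossing PCIe per pooled embedding lookup, with and without
        # near-memory reduction. Sizes are assumed for illustration.
        ROW_BYTES = 64 * 4        # one embedding row: 64 float32 features
        LOOKUPS_PER_SAMPLE = 80   # rows gathered per pooled lookup

        naive_bytes = LOOKUPS_PER_SAMPLE * ROW_BYTES  # ship every row to host
        pooled_bytes = ROW_BYTES                      # ship only pooled result

        print(f"host-side pooling: {naive_bytes} B over PCIe per lookup")
        print(f"SEM-side pooling : {pooled_bytes} B over PCIe per lookup "
              f"({naive_bytes // pooled_bytes}x less traffic)")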

    Shape control of nanostructured cone-shaped particles by tuning the blend morphology of A-b-B diblock copolymers and C-type copolymers within emulsion droplets

    No full text
    Block copolymers (BCPs) under colloidal confinement provide an effective route to producing nonspherical particles. However, the resulting structures are typically limited to spheroids, and it remains challenging to achieve a higher level of control over particle shape with different symmetries. Herein, we exploit blends of BCPs and statistical copolymers (sCPs) within emulsion droplets to develop a series of particles with different symmetries (i.e., Janus-sphere and cone-shaped particles). The particle shape is tunable by controlling the phase behavior of a polymer blend consisting of a poly(styrene-block-1,4-butadiene) (PS-b-PB) BCP and a poly(methyl methacrylate-stat-(4-acryloylbenzophenone)) (P(MMA-stat-4ABP)) sCP. A key strategy for controlling the phase separation of the polymer blend is to systematically tune the incompatibility between the BCP and the sCP by varying the composition of the sCPs (the mole fraction of 4ABP). As a result, a sequential morphological transition from a prolate ellipsoid, to a Janus-sphere, to a cone-shaped particle is observed as the 4ABP mole fraction increases. We further demonstrate that the shape anisotropy of the cone-shaped particles can be tailored by controlling the particle size and the Janusity, supported by quantitative calculation of the particle shape anisotropy from a theoretical model. The importance of shape control for cone-shaped particles of high uniformity in a batch is also demonstrated by investigating their coating properties, in which the deposited coating pattern is a strong function of the shape anisotropy of the particles.